arXiv:1504.05199v1 [astro-ph.SR] 20 Apr 2015


Draft version April 22, 2015 

Preprint typeset using LaTeX style emulateapj v. 12/16/11


K2P² — A PHOTOMETRY PIPELINE FOR THE K2 MISSION

Mikkel N. Lund^{1,2,*}, Rasmus Handberg^{2,1},
Guy R. Davies^{2,1}, William J. Chaplin^{2,1}, and Caitlin D. Jones^{2,1}

^1 Stellar Astrophysics Centre (SAC), Department of Physics and Astronomy, Aarhus University,
Ny Munkegade 120, DK-8000 Aarhus C, Denmark; ^* mikkelnl@phys.au.dk and
^2 School of Physics and Astronomy, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK


Abstract 

With the loss of a second reaction wheel, resulting in the inability to point continuously and stably at the 
same field of view, the NASA Kepler satellite recently entered a new mode of observation known as the K2 
mission. The data from this redesigned mission present a specific challenge; the targets systematically drift in 
position on a ~6 hour time scale, inducing a significant instrumental signal in the photometric time series — 
this greatly impacts the ability to detect planetary signals and perform asteroseismic analysis. Here we detail 
our version of a reduction pipeline for K2 target pixel data, which automatically: defines masks for all targets in
a given frame; extracts the target’s flux- and position time series; corrects the time series based on the apparent
movement on the CCD (either in 1D or 2D) combined with the correction of instrumental and/or planetary
signals via the KASOC filter (Handberg & Lund 2014), thus rendering the time series ready for asteroseismic 
analysis; computes power spectra for all targets, and identifies potential contaminations between targets. From 
a test of our pipeline on a sample of targets from the K2 campaign 0, the recovery of data for multiple targets 
increases the amount of potential light curves by a factor of >10. 

Our pipeline could be applied to the upcoming TESS (Ricker et al. 2014) and PLATO 2.0 (Rauer et al. 2013) 
missions. 

Keywords: asteroseismology — methods: data analysis — techniques: photometric — techniques: image 
processing — stars: solar-type 

1. INTRODUCTION

K2 (Howell et al. 2014) is the continuation of the nominal NASA Kepler mission (Borucki et al. 2010;
Gilliland et al. 2010b), which ended with the loss of a second reaction wheel in May 2013. The stability
solution for the Kepler satellite is to balance in an unstable equilibrium against the Solar photon
pressure and correct rolls with thruster firings, while pitch and yaw are controlled by the two remaining
reaction wheels; this strategy allows for observations in fields along the ecliptic plane, with an
observing length per field of close to 80 days. This time span is known as a “Campaign” (C), and is the
analogue to the 3 month “Quarters” (Q) used in the nominal Kepler mission. In the nominal mission targets
were designated using a Kepler Input Catalogue (KIC) number, which has now been replaced by the Ecliptic
Plane Input Catalogue (EPIC) number.

The systematic pointing drift in the K2 observations, from the adopted stabilisation of the spacecraft,
calls for new light curve correction methods. One such method has recently been proposed by Vanderburg &
Johnson (2014), and uses the positions on the CCD as a function of time to decorrelate the induced
variations in the light curve. The larger fields around targets in K2 (needed to account for the apparent
movement of the target on the CCD) and the increased crowding from pointing toward the ecliptic mean that
many stars are often found in a given frame. This, combined with the potential lack of aperture masks from
the Kepler team, necessitates the development of new methods to extract the flux and position of targets
from custom apertures in an efficient and robust manner.

The paper is structured as follows: in Section 2 we describe the steps taken in our light curve
construction, starting from raw Target Pixel Files (TPF) and going to the definition of pixel masks and
extraction of target positions and light curves. Section 3 pertains to the correction of the light curves
for the time dependent movement on the CCD; here we describe both our version of the 1D self
flat-fielding introduced by Vanderburg & Johnson (2014) in Section 3.1, and our suggestion for a 2D
approach in Section 3.2. In Section 4 we present results from a test of our pipeline on a target sample
during C0, and conclude in Section 5.

2. LIGHT CURVE CONSTRUCTION

The nominal Kepler mission delivered a pixel aperture (a mask) where the chosen pixels optimised the mean
signal-to-noise ratio (S/N) based on estimates of the pixel response function (PRF) and information from
the KIC (Bryson et al. 2010; Jenkins et al. 2010). This mask could be used to construct custom masks by
adding or removing pixels to the starting mask based, for example, on the amount of flux in the pixels.
This procedure was adopted in the KASOC filter pipeline (Handberg & Lund 2014) using the routine developed
by Mathur et al. (in preparation). Masks are no longer delivered, at least not for the data releases made
to date, which calls for a new method to define pixel masks. Masks constructed from ranking pixels in
order of their S/N, and then including the number of pixels which optimises, for instance, the combined
differential photometric precision (CDPP) noise metric (Gilliland et al. 2011; Christiansen et al. 2012)
or the mean S/N could run into problems if signals from other stars are not removed; this is especially
difficult if there are secondary objects in close proximity to the primary target.

In the following we describe our pipeline for the construction of light curves, called K2P²
(K2-Pixel-Photometry), which delivers both the position and flux for all the objects in the delivered
frames. The stellar position as a function of time is used to filter the light curve for variations in
flux induced by the movement of the stars over different pixels which have varying sensitivities (see
Section 3). We define fixed masks from a summed image (see Section 2.2), which are large enough to
encompass the stellar movement on the CCD. We go through the different steps in the K2P² pipeline below.
In all examples times will be in truncated Barycentric Julian Date (TBJD)^1 given as BJD - 2400000.

2.1. Background estimation

As the initial step of K2P² we estimate the sky background as a function of time, because this
contribution is unaccounted for in the flux from the raw K2 target pixel data. For each time step we
calculate the mode of the flux kernel density estimation (using Scott’s (Scott 1979) rule for setting the
bin width) from all pixels as the maximum likelihood estimator for the sky background. We thus assume a
uniform background flux across a given image.

The sky background level is far from constant, but increases gradually (by around 25% in C0) over the
course of a campaign; a typical example can be seen in Figure 1. In C0 the background level was further
increased for many channels by the antipodal ghost image of Jupiter as it fell on one of Kepler’s dead
modules^2. The change in background levels can largely be attributed to changing levels of stray light
entering the photometer from the change in angle between the Sun and the photometer, and is thus additive.
Secondary changes might come from changes in focus as the heating of the spacecraft varies.

If the background level variation is unaccounted for it will appear in the extracted light curve; it is
preferable to isolate this component and separate it from trends caused by the degraded attitude.

Figure 1. Kernel distribution of flux within the pixel frame as a function of time (TBJD) during C0 (here
for EPIC 202062417). The color scale goes from light for a low flux level to dark for a high flux level;
the red line indicates the distribution mode. Times with quality flags indicating contamination of any
sort have been excluded (Fraquelli & Thompson 2012). The presence of a ghost image of Jupiter elevates the
background flux between 56728 and 56788 TBJD, with high flux spikes at the beginning and end of this
interval where Jupiter enters and exits the focal plane, making specular reflections.

Figure 2. Summed image from short-cadence C0 data of EPIC 202062417, with the background mode (see
Figure 1) subtracted from individual frames. Colour scale is logarithmic, going from light (low flux) to
dark blue (high flux), and negative levels are truncated to 0. Left: Summed image with frames having a bad
quality flag removed. Right: Summed image with all frames included; here a ghost image of the brightest
targets appears shifted approximately two pixels up and five pixels to the right.

^1 This differs from the Barycentric Kepler Julian Date (BKJD) given as BJD - 2454833.

^2 http://keplerscience.arc.nasa.gov/K2/C0drn.shtml
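
As an illustration of the background estimate of Section 2.1, the following is a minimal Python sketch and not the K2P² code itself. It assumes a Gaussian kernel with the bandwidth set by Scott's factor in its common one-dimensional form, h = σ n^(-1/5); the function names `kde_mode` and `background_per_frame` and the grid resolution are hypothetical choices made for this example.

```python
import math
import statistics


def kde_mode(values, grid_points=512):
    """Return the location of the maximum of a Gaussian KDE of `values`.

    Assumption of this sketch: bandwidth h = sigma * n**(-1/5)
    (Scott's factor for one-dimensional data).
    """
    n = len(values)
    sigma = statistics.stdev(values)
    h = sigma * n ** (-1.0 / 5.0)
    lo, hi = min(values), max(values)
    best_x, best_density = lo, -1.0
    for k in range(grid_points):
        x = lo + (hi - lo) * k / (grid_points - 1)
        # Un-normalised density; the normalisation does not move the mode.
        density = sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def background_per_frame(frames):
    """One background level per frame; each frame is a flat list of pixel fluxes."""
    return [kde_mode(frame) for frame in frames]
```

Because most pixels in a frame carry only sky flux, the mode is far less sensitive to the few bright stellar pixels than the mean would be.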


2.2. Summed image 

For setting pixel masks we create a summed image. Here, frames are co-added after first having subtracted
the corresponding sky background levels (see Section 2.1). We make use of the quality flags available in
the pixel data FITS files (Fraquelli & Thompson 2012), and ignore all frames with a flag indicating any
non-optimal data. The effect of neglecting this is illustrated in Figure 2. Including frames with bad
quality flags, for instance when reaction wheel momentum dumps are made, results in the creation of a
shifted ghost image. If a summed image including a shifted ghost image were used in setting masks, these
would be much larger than needed and would essentially only add noise for the majority of the time series.
It would also be difficult for an automatic routine that can separate close targets to identify the ghost
image as belonging to the main target rather than being a target in its own right.
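
The co-adding step can be sketched as follows. This is a hedged illustration only: it assumes that a quality flag of 0 marks a frame without any flagged condition, and the function name `summed_image` and the nested-list image layout are hypothetical.

```python
def summed_image(frames, backgrounds, quality):
    """Co-add background-subtracted frames, skipping frames with a non-zero flag.

    frames      : list of 2D images (lists of rows)
    backgrounds : one sky background level per frame (Section 2.1)
    quality     : one integer quality flag per frame (0 assumed to mean "good")
    """
    n_rows, n_cols = len(frames[0]), len(frames[0][0])
    total = [[0.0] * n_cols for _ in range(n_rows)]
    for frame, bkg, flag in zip(frames, backgrounds, quality):
        if flag != 0:
            continue  # e.g. a frame taken during a momentum dump
        for r in range(n_rows):
            for c in range(n_cols):
                total[r][c] += frame[r][c] - bkg
    return total
```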


2.3. Pixel mask selection 

To fix the masks we first select which pixels can be included in a mask by setting a flux threshold. The
threshold is obtained as the median absolute deviation (MAD) of the part of the summed image flux
distribution which falls to the left-hand side of the mode of the distribution. Only the left-hand side
of the distribution is used, as the right-hand side is influenced more strongly by the stellar flux.
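
A minimal sketch of this thresholding is given below. Two points are assumptions of the example rather than statements from the text: the MAD is taken about the median of the left-hand values, and the cut is placed at the mode plus a multiple `scale` of the MAD. The names `left_side_mad`, `candidate_pixels` and `scale` are hypothetical.

```python
import statistics


def left_side_mad(fluxes, mode):
    """MAD of the fluxes at or below the mode of the distribution."""
    left = [f for f in fluxes if f <= mode]
    centre = statistics.median(left)
    return statistics.median([abs(f - centre) for f in left])


def candidate_pixels(image, mode, mad, scale=3.0):
    """(row, col) of pixels above mode + scale * MAD (the scale is a hypothetical choice)."""
    cut = mode + scale * mad
    return [(r, c)
            for r, row in enumerate(image)
            for c, flux in enumerate(row)
            if flux > cut]
```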

On the pixels with flux levels above the threshold we run an unsupervised clustering algorithm to locate
targets in the frame and set individual masks for these. Specifically, we use the density-based spatial
clustering of applications with noise routine (DBSCAN; Ester et al. 1996) as implemented in the
Python-based library Scikit-learn^3 (see Pedregosa et al. 2011). DBSCAN only takes two input parameters: a
neighbourhood radius r_c, and a minimum number of points needed to form a cluster N_min. Given the
regularity of the pixel grid, these parameters can be set optimally a priori to yield a desired output. An
advantage of the DBSCAN routine is that it does not need a predefined number of clusters, and that the
clusters can have very irregular shapes, allowing it to encompass the spatial distribution of flux from a
star on the CCD in K2, which depends both on time and position on the focal plane.

^3 http://scikit-learn.org/

Figure 3. Illustration of the working principle of the DBSCAN clustering algorithm. Pixels with flux above
a pre-set threshold that are used in the clustering are marked in gray. Two clusters are identified with
constituent pixels marked by either red circles or green squares; the cross marker gives the pixel
identified as noise, and for each pixel its neighbourhood radius, r_c, is indicated. Filled markers
indicate core cluster members; empty markers give the edge members. The edge member that could belong to
either cluster (in this case associated with the green cluster) is indicated with a dashed neighbourhood
radius.

The working principle of DBSCAN is, briefly: (1) select at random a point, with “points” being the pixels
with flux above the threshold; (2) check how many other points, N_c, are within the neighbourhood radius
of the selected point; (3) if N_c ≥ N_min the point is designated as a core point and the start of a
cluster, otherwise, if N_c < N_min, it is (at this step) designated as a noise point; (4) step (2) is now
run on points within r_c of the first point, and so on for their respective neighbourhood points, and
points are added to the first cluster until no more points are density reachable, that is, can be
connected by a chain of points to the initial point seeding the cluster; (5) a point that falls within r_c
of a cluster core point, but which has N_c < N_min in its own neighbourhood, is designated as an edge
point to the cluster. Note that if such an edge point was the first considered by the routine, it would
have been flagged as a noise point, but it will change status later in the routine if found within r_c of
a cluster core point; (6) when no more points can be added to the first cluster, one of the remaining
points is selected at random and the steps are run through anew. This continues until all points have a
designation.

An illustration is provided in Figure 3, where we set r_c = √2 pixels and N_min = 3. Each of the clusters
returned is seen as a target, with the core and edge members of the individual clusters defining the outer
boundary of the masks of the targets. Edge members within reach of more than one cluster could belong to
either one of the clusters, and the membership of such a point would be determined entirely by the random
initialisation of the routine. Core members, on the other hand, can be assigned clusters with full
determinism, and will always group in the same way. We find, however, that the gain from a larger mask,
which includes both core and edge members, outweighs the potential ambiguity and loss of repeatability
from including a point that could belong to more than one cluster. In order to make the clustering
reproducible we chose a fixed random seed for the algorithm^4, which ensures that the clustering and
designation of points will stay the same for a rerun with the same settings.
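
The steps of the clustering can be made concrete with a small pure-Python sketch. It is not the Scikit-learn implementation used in the pipeline: points are visited in list order rather than at random, and, following the description above, only the other points in a neighbourhood are counted (Scikit-learn's `min_samples` also counts the point itself). The function name `dbscan` and the `NOISE` label are hypothetical.

```python
import math

NOISE = -1


def dbscan(points, radius, n_min):
    """Label each (x, y) point with a cluster index, or NOISE."""

    def neighbours(i):
        xi, yi = points[i]
        # Small tolerance so that diagonal pixels at sqrt(2) are included.
        return [j for j, (xj, yj) in enumerate(points)
                if j != i and math.hypot(xi - xj, yi - yj) <= radius + 1e-9]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < n_min:
            labels[i] = NOISE  # may later be promoted to an edge point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise point becomes an edge member
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= n_min:  # j is a core point: keep expanding
                queue.extend(reach)
    return labels
```

With r_c = √2 and N_min = 3, a 2 x 2 block of pixels forms a cluster of core points, while a pixel attached to only two of them joins as an edge member.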

2.4. Saturated targets 

The setting of masks for saturated targets calls for some extra attention. The saturation limit is at a
Kepler magnitude^5 of Kp ~ 11.3 (Gilliland et al. 2010a), and saturated targets will typically have pixel
column trails along which flux spills, or bleeds. If the ends of these trails fall outside the mask the
variability in the flux will be missed, resulting in a high-flux truncation of the light curve. The
bleed-out is position dependent from the varying pixel sensitivities across the focal plane, but in K2 it
will also depend on time, because the targets now have a time and position dependent movement on the
detector. This results in an even poorer predictability of the amount of bleed-out; we find that
bleed-outs generally start for Kp < 9.

An optimum inclusion of bleed-out trails is particularly dif¬ 
ficult if the trail extends to other targets, or reaches the detec¬ 
tor edge; in such cases a trade-off must be made between the 
amount of flux that can be included from the main target and 
the contamination from neighbouring targets. 

We have implemented the following procedure for dealing 
with saturated targets (see Figure 6): For a given target we 
compute for each pixel-column, using pixels in the target’s 
mask, the ratio between the absolute value of the median of 
the first differences in the flux counts of the pixels and the 
maximum flux count of the pixels. A low value of this ra¬ 
tio indicates a small relative variability in the flux counts, as 
would be the case for a near-constant flux level in a column 
with many saturated pixels. If the ratio is below 1%, and the 
median of the pixel flux counts (still only for the pixels in 
the mask) is equal to or larger than half of the maximum flux 
count for the entire mask, the column is taken as having satu¬ 
rated pixels. The restriction on the median of the flux counts 
ensures that columns containing many pixels with flux lev¬ 
els close to the background, where the relative variability also 
is small, are recognised as non-saturated. For the columns 
identified as saturated we then add pixels to the mask if these 
have counts above the flux threshold used in Section 2.3. This 
could potentially result in pixels belonging to both a saturated 
target as well as a nearby secondary target. 
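
The column test can be sketched as below. This is an illustrative reading of the procedure, assuming the ratio uses the maximum flux count within the column and that pixel values are ordered along the column; the function name `saturated_columns` and the dictionary layout are hypothetical.

```python
import statistics


def saturated_columns(mask_flux, ratio_limit=0.01):
    """Return the columns flagged as containing saturated pixels.

    mask_flux maps a column index to the flux counts of the mask pixels
    in that column, ordered along the column.
    """
    mask_max = max(max(col) for col in mask_flux.values())
    flagged = []
    for x, col in sorted(mask_flux.items()):
        if len(col) < 2:
            continue  # no first differences can be formed
        diffs = [b - a for a, b in zip(col, col[1:])]
        ratio = abs(statistics.median(diffs)) / max(col)
        # The median requirement rejects columns dominated by background pixels.
        if ratio < ratio_limit and statistics.median(col) >= 0.5 * mask_max:
            flagged.append(x)
    return flagged
```

In the usage below, column 14 has a low relative variability but is rejected because its flux level is close to the background.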

For the brightest and most saturated targets (Kp < 8), with bleed-outs spanning many tens of pixels (e.g.,
EPIC 202061312), with much flux contained in diffraction spikes on the CCD, and typically with multiple
secondary targets in the near vicinity, the mask should be defined manually, as was done, for instance,
for the 16 Cyg stars in the nominal Kepler mission (Lund et al. 2014; Davies et al. 2015).

2.5. Separating close targets 

After a set of clusters has been identified there is still the possibility that a given cluster might
encompass two or more stars if these lie close to each other. To separate such targets in a given cluster
we run an algorithm often used in image segmentation problems known as the watershed method (Beucher &
Lantuejoul 1979; Beucher & Meyer 1993), as implemented in Scikit-image^6 (see Van Der Walt et al. 2014).

^4 Specifically we used the seed 1138 (see Lucas 1977).

^5 Nearly equivalent to an R band magnitude (Koch et al. 2010).

^6 http://scikit-image.org/
Figure 4. Illustration of initial steps (Sections 2.2-2.4) in selection of pixel masks (here for C0
observations of EPIC 202127012). For pixels marked in black no flux is collected. Left: Summed image
(Section 2.2) with the background mode (Figure 1) subtracted from individual frames; the colour scale is
logarithmic, going from light (low flux) to dark blue (high flux), and negative levels are truncated to 0.
Middle: Collection of pixels with flux levels above a predefined threshold (Section 2.3). Right: Clusters
identified from running the DBSCAN clustering algorithm, each of which is marked with a distinct colour;
filled pixels mark core members, circles indicate edge members, and crosses give pixels identified as
noise. In this run we used r_c = 1 pixel and N_min = 3.

Figure 5. Illustration of final steps (Sections 2.5-2.8) in setting pixel masks and extraction of target
positions and fluxes (as in Figure 4 for C0 observations of EPIC 202127012). For pixels marked in black no
flux is collected. Filled pixels mark core members, while circles indicate edge members, and crosses give
pixels identified as noise. Left: Application of the watershed segmentation algorithm on the clusters
identified in the right panel of Figure 4; the colour scale indicates the relative negative flux level for
each cluster individually (i.e., levels do not translate between clusters) after application of a Gaussian
2D filter. Levels go from light (low negative flux) to dark blue (high negative flux) and are rendered on
a logarithmic scale. Red circles show the identified local minima which are used as markers in the
watershed routine. Red lines give the mask borders after the watershed segmentation; as seen, the large
central cluster has been divided into four components. Middle: Masks of the now ten identified targets,
each rendered in a different colour. The three brightest targets have been designated with numbers; the
primary target is star no. 1 (see Figure 9). Right: An example of weights (w_i) of pixels within the
different masks, here given by the euclidean distance between a given pixel and the nearest pixel outside
the mask; the scale is again only applicable for the individual masks and does not translate between
masks. Black lines indicate the mask borders.


The idea in a watershed algorithm is to find the line(s) be¬ 
tween two or more regions, that may be seen as topographical 
surfaces; considering two neighbouring catchment basins that 
are flooded with water, the watershed will be the line where 
water levels meet. 

To transform the pixel clusters to a topographical relief each point in a given cluster is assigned a
value from the metric given either by the negative of the euclidean distance to the nearest background
point (i.e., a point not in the specific cluster) or the negative value of its flux. This results in
cluster points close to the edge having low negative values while central points of the cluster, which are
further away from the background and generally have higher flux levels, have high negative values; this
constitutes the catchment basins. If a cluster includes two or more stars that are not completely covered
by a common envelope they will have distinctive central dips in both the distance and the flux metric. If
the stars share a common envelope (seen if the stars are very close, or if one star greatly outshines the
other) the flux metric is superior in making distinctive dips for the two (or more) stars; the distance
metric will rather make a central dip for the whole region covered by the common envelope. As the default
we use the flux metric to separate targets.

In the adopted watershed algorithm we first identify the lo¬ 
cal minima of the metric used and then use these as mark¬ 
ers for the centres of the catchment basins which are then 
flooded to find the watershed lines. To avoid noise peaks be¬ 
ing considered as markers we first smooth the surface with a 
2D Gaussian filter, and then locate the most prominent min¬ 
ima — these are then fed as markers to the watershed routine. 
We now have pixel masks for all targets in a given frame. 
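
The marker-based flooding described above can be illustrated with a compact priority-queue sketch. It is not the Scikit-image routine used in the pipeline: it assumes 4-connected pixels, omits the Gaussian smoothing step, and does not draw explicit watershed lines. The function names `local_minima` and `watershed` and the dictionary representation of the relief are hypothetical.

```python
import heapq


def _neighbours(pix):
    r, c = pix
    return ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))


def local_minima(relief):
    """Pixels whose relief value is strictly below all of their cluster neighbours."""
    return sorted(p for p in relief
                  if all(relief[p] < relief[q] for q in _neighbours(p) if q in relief))


def watershed(relief, markers):
    """Flood the relief {(row, col): value} from the markers; returns {(row, col): label}."""
    labels = {}
    heap = []
    for label, pix in enumerate(markers, start=1):
        labels[pix] = label
        heapq.heappush(heap, (relief[pix], label, pix))
    while heap:
        level, label, pix = heapq.heappop(heap)
        for nb in _neighbours(pix):
            if nb in relief and nb not in labels:
                labels[nb] = label
                # The water level can only rise as the basin fills.
                heapq.heappush(heap, (max(level, relief[nb]), label, nb))
    return labels
```

In the usage below the relief is the negative flux of a single row containing two stars; the pixels are divided between the two basins.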

Following the method outlined in Sections 2.3-2.5 for setting the pixel mask, we obtain for a sample of
4691 targets observed during C0 (see Section 4) mask sizes as a function of magnitude (see Section 2.6) as
given in Figure 7. Here we note a slight gradient in the mask size as a function of angular distance to
the spacecraft boresight for a given Kp_1; this is as expected because the arc traced in apparent movement
on the CCD from the roll of the spacecraft increases linearly with distance from the boresight. The
scatter in this relation will have contributions from the dependence of the degree of flux-smearing on the
target position on the focal plane, and the uncertainty of the determined magnitude. For comparison we
also show the magnitude dependence of aperture sizes from Aigrain et al. (2014), where the authors use
circular apertures/masks.

Figure 6. Addition of pixels in bleed-out columns for the saturated target EPIC 202062417, using
short-cadence C0 data. Different lines correspond to different X-pixel columns, see insert. Red circle
(green square) markers indicate pixels in (not in) the mask of the main target (see Figure 2). Markers
filled with black give the pixels included in the mask after the check for saturation bleed-out; in the
given case the whole of column 13 is added to the mask of the primary target.

Figure 7. Mask size as a function of the proxy Kepler magnitude Kp_1 (see Equation 2.1) for a sample of
4691 C0 targets (see Section 4). Black markers indicate the median Kp_1 for each of the discrete mask
sizes; the colour coding indicates the angular distance for each target to the spacecraft boresight; the
red dashed line gives the mask sizes from Aigrain et al. (2014).

2.6. Target magnitudes 

Our pipeline enables the extraction of data for multiple targets in a given frame, but from the
information in the target pixel data we only have a Kepler magnitude, Kp, for the primary target. First,
however, it should be noted that when targets were proposed for C0 the EPIC did not exist. Therefore, a
magnitude given in the EPIC^7 for a given C0 target is the one provided by the principal investigator
proposing the target, rather than one computed by the Kepler team. For the same reason no information is
given in the KepFlag entry of the EPIC for C0, which is supposed to contain information on the data used
to compute Kp; one should therefore consult the proposal of a given target to assess how the magnitude was
constructed. For the sample of targets we have analysed, viz., the proposal GO1038 (see Section 4), it
turns out that the EPIC Kepler magnitudes are given by J-band magnitudes from the Two Micron All Sky
Survey (2MASS; Skrutskie et al. 2006). To transform these J-band magnitudes to more proper Kepler
magnitudes we use the transformation from Howell et al. (2012) between Kp and 2MASS J - K_s colors.

^7 http://archive.stsci.edu/k2/epic.pdf

In order to investigate how parameters such as mask sizes and noise measures vary with magnitude we need a
way to estimate Kp for all targets in a given frame. We approximate Kp by the proxy Kepler magnitude Kp_1
defined as:

Kp_1 = 25.3 - 2.5 \log_{10}(S) ,    (2.1)

where S denotes the median of the flux time series extracted for the target (in units of e^-/s). The
correspondence between Kp and Kp_1 is shown in the left panel of Figure 8; some of the scatter in this
relation will originate from the scatter in the mask size versus magnitude relation (see Figure 7), and a
variation in pixel sensitivities between targets. We note that Aigrain et al. (2014) define a proxy Kepler
magnitude in the same manner and also find an offset of ~25.3. As a means of identifying targets falling
within a given mask (see Section 2.7) we use the USNO-B1.0 catalogue (Monet et al. 2003), which is an
all-sky catalogue with a completeness down to V = 21. We would like a measure of Kp for all targets from
the USNO-B1.0 catalogue within a given frame, because this is used in the identification of targets (see
Section 2.7). In addition we can estimate potential contaminations when multiple targets fall within the
same mask. For each of the identified targets from the USNO-B1.0 catalogue that fall within a given mask,
we first compute the magnitude RB from the USNO-B1.0 R- and B-band magnitudes:

RB = \begin{cases} 0.1 B_{mag} + 0.9 R_{mag} , & (B_{mag} - R_{mag}) \le 0.8 \\ 0.2 B_{mag} + 0.8 R_{mag} , & (B_{mag} - R_{mag}) > 0.8 \end{cases}    (2.2)

According to Brown et al. (2011) this corresponds to the way Kepler magnitudes, Kp, are calculated in the
KIC if only R- and B-band magnitudes are available. We define the following relation as a second proxy
Kepler magnitude:

Kp_2 = RB - 0.33 .    (2.3)

The correspondence between Kp and Kp_2 is shown in the right panel of Figure 8.
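
Equations 2.1-2.3 translate directly into code. The sketch below is a plain transcription, with the offsets 25.3 and 0.33 taken from the text; the function names `kp1`, `rb_magnitude` and `kp2` are hypothetical.

```python
import math


def kp1(median_flux, offset=25.3):
    """Proxy magnitude from the median flux S in e-/s (Equation 2.1)."""
    return offset - 2.5 * math.log10(median_flux)


def rb_magnitude(b_mag, r_mag):
    """Combined magnitude from B- and R-band magnitudes (Equation 2.2)."""
    if b_mag - r_mag <= 0.8:
        return 0.1 * b_mag + 0.9 * r_mag
    return 0.2 * b_mag + 0.8 * r_mag


def kp2(b_mag, r_mag, offset=0.33):
    """Second proxy magnitude (Equation 2.3)."""
    return rb_magnitude(b_mag, r_mag) - offset
```

For example, a median flux of 10^4 e-/s corresponds to Kp_1 = 25.3 - 2.5 x 4 = 15.3.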

The relation giving the Kp_1 proxy has the smallest amount of scatter, and will be used to relate mask
sizes and noise measures to magnitude; Kp_2 will be used in the identification of targets, and estimation
of contaminations. An advantage of having both Kp_1 and Kp_2 is also that a large discrepancy between the
two measures can be used to identify targets where the mask is either much too large or much too small.

The offsets for both Kp_1 and Kp_2 were estimated in a Bayesian manner using the affine invariant emcee
sampler (Foreman-Mackey et al. 2013), and are given by the median of the marginalized posteriors; the
uncertainties were obtained from the 68% highest probability density of the marginalized posteriors.
Figure 8. Relation between proxy Kepler magnitudes Kp_1 (Equation 2.1) and Kp_2 (Equation 2.3) and the
nominal Kepler magnitude, Kp, computed from 2MASS J - K_s colors. The dashed lines give the 1:1 relation.
For 0.5 magnitude bins in Kp the median Kp_1 and Kp_2 values are given by a red marker.


2.7. Locating main and secondary targets 

In K2 a standardised mask is no longer delivered for the main target, at least not in the data releases so
far. This, combined with the increased crowding in the equatorial pointing and larger frames, makes it
more difficult to ascertain which target is the main target. Also, the primary target is sometimes fainter
than secondary targets in the frame. A starting point for locating the primary target is the assumption
that it is (approximately) centred in the frame, but it will still be difficult to use this exclusively in
crowded fields. The target pixel files from K2 do deliver a world coordinate system (WCS; Greisen &
Calabretta 2002; Calabretta & Greisen 2002; Greisen et al. 2006) metric in the FITS format. The WCS from
K2 data release 2 is fairly well calibrated (not available in the engineering data) as shown in Figure 9.
Here we have marked the positions of all targets from the USNO-B1.0 catalogue (Monet et al. 2003) obtained
by using the WCS transformation to pixel coordinates; it is clear that the WCS delivers a reasonable
transformation, generally within two pixels of maxima in the summed images. An advantage of our pipeline
is that masks are defined for all targets in the field (unless they are too faint), and so identification
can be made at a later stage.

So far we have made identifications using the Python module Astroquery (Ginsburg et al. 2013), together
with the WCS module in the Kapteyn package^8. This enables us to link targets with objects from the
USNO-B1.0 catalogue. The procedure for the identification of targets is as follows: (1) load sky
coordinates for all targets located within a circular region which fully contains the frame for the EPIC
target in question; (2) transform the sky coordinates of these targets to pixel positions using the WCS
from the K2 target pixel file; (3) compute a proxy for the Kepler magnitude, Kp_2 (see Section 2.6 above);
(4) locate maxima in the summed image (Section 2.2) where a 0.5 pixel wide Gaussian smoothing has been
applied; (5) compute all X- and Y-pixel differences between the targets and the maxima of the summed
image; (6) run a DBSCAN clustering on all pixel differences within ±5 pixels in both the X- and
Y-direction, with the clustering parameters set to r_c = 0.5 pixels and N_min ≤ N_maxima. The value of
N_min will initially be the number of identified local maxima and will iteratively be decreased until a
cluster is identified in the differences (a “difference cluster”); (7) if more than one difference cluster
is found within ±5 pixels, we choose the cluster with the lowest mean Kp_2 magnitude; (8) as the
correction that should be applied to the WCS transformation we take the weighted average of X- and
Y-differences in the difference cluster, using one over the Kp_2 magnitudes as weights (see Figure 9).
Note that this correction only includes translation, but ignores rotation. This could be amended by using
a pattern matching algorithm (see, e.g., Spratling & Mortari 2009), but the offsets are low enough that
this can be safely omitted; (9) the target from USNO-B1.0 with corrected pixel coordinates closest to the
median centroids of identified target clusters is used to identify the cluster. Here we also note if other
targets fall in the mask of a given target cluster.

^8 http://www.astro.rug.nl/software/kapteyn/
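
Steps (5) and (8) of the procedure can be sketched as follows. This is a simplified illustration: the clustering of steps (6) and (7) is left out, so the differences passed to `wcs_offset` are assumed to already form the chosen difference cluster, and the sign convention (maximum minus catalogue position) is an assumption. The function names are hypothetical.

```python
def pixel_differences(catalogue_xy, maxima_xy, limit=5.0):
    """All (dx, dy, catalogue_index) with |dx| and |dy| within `limit` pixels."""
    out = []
    for k, (xc, yc) in enumerate(catalogue_xy):
        for xm, ym in maxima_xy:
            dx, dy = xm - xc, ym - yc
            if abs(dx) <= limit and abs(dy) <= limit:
                out.append((dx, dy, k))
    return out


def wcs_offset(differences, kp2_mags):
    """Weighted mean (dx, dy) of a difference cluster, with weights 1 / Kp_2."""
    weights = [1.0 / kp2_mags[k] for _, _, k in differences]
    total = sum(weights)
    dx = sum(w * d[0] for w, d in zip(weights, differences)) / total
    dy = sum(w * d[1] for w, d in zip(weights, differences)) / total
    return dx, dy
```

Adding the returned (dx, dy) to the WCS-derived pixel positions then gives the corrected catalogue positions used in step (9).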


2.8. Target flux and position 

The above steps were concerned with the creation of pixel 
masks for all the different targets in a given frame. For all 
these targets we compute the position and flux as a function of
time; this will be used later for the correction of the spacecraft
roll. We use weights, $w_i$, on the pixels in the individual
masks when extracting fluxes and calculating the target position
via the centroid (CEN) with (X,Y) components given
as


$$\mathrm{CEN}_x = \frac{\sum_i p_i w_i X_i}{\sum_i p_i w_i}\,, \qquad \mathrm{CEN}_y = \frac{\sum_i p_i w_i Y_i}{\sum_i p_i w_i}\,. \tag{2.4}$$


Here $p_i$ denotes the flux for the $i$th pixel in a given mask, and
$X_i$ and $Y_i$ denote the coordinates of the pixel.
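
The centroid of Equation 2.4 is a flux- and weight-weighted mean of the pixel coordinates. A minimal sketch is given below; the function name `centroid` and the flat-list input layout are assumptions made for illustration only.

```python
def centroid(flux, weights, xs, ys):
    """Weighted centroid (CEN_x, CEN_y) of the pixels in one mask.

    flux    -- pixel fluxes p_i
    weights -- pixel weights w_i
    xs, ys  -- pixel coordinates X_i, Y_i
    """
    norm = sum(p * w for p, w in zip(flux, weights))
    cen_x = sum(p * w * x for p, w, x in zip(flux, weights, xs)) / norm
    cen_y = sum(p * w * y for p, w, y in zip(flux, weights, ys)) / norm
    return cen_x, cen_y


# Two pixels with unit weights; the brighter pixel pulls the centroid.
cen = centroid(flux=[3.0, 1.0], weights=[1.0, 1.0], xs=[0.0, 4.0], ys=[1.0, 1.0])
```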

We have defined the following three pixel weightings $w_1$ to
$w_3$: in $w_1$ all weights are set to $w_i = 1$, giving an “in/out”
mask where all pixels have equal weight; in $w_2$ weights are
given as the exact Euclidean distance between a pixel in the
mask $(X_i, Y_i)$ and the closest background pixel $(X_{i,b}, Y_{i,b})$:


$$w_i = \sqrt{(X_{i,b} - X_i)^2 + (Y_{i,b} - Y_i)^2}\,. \tag{2.5}$$


In $w_3$ weights are set to create a soft edge on the mask, with
a uniformly weighted central region. This is accomplished by
dividing every pixel into 11 × 11 subpixels, each of which is
assigned a weight given by Equation 2.5 and normalized by
11.3. If this normalized weight is above 1 it is set to 1. This
results in a mask edge just over 1 pixel wide where the
weight gradually increases from 1/11.3 to 1.
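
The distance weighting of Equation 2.5 can be sketched with a brute-force search for the nearest background pixel. The function name `distance_weights` and the boolean-grid representation of the mask are assumptions. The sketch treats only pixels inside the grid as background and needs at least one such pixel. The subpixel division and the normalization by 11.3 used for the soft-edge weighting are not included.

```python
import math


def distance_weights(mask):
    """Distance from every mask pixel to the nearest background pixel.

    mask -- list of rows of booleans, True for pixels in the target mask.
    Returns a dict {(x, y): distance} for the mask pixels.
    """
    ny, nx = len(mask), len(mask[0])
    background = [(x, y) for y in range(ny) for x in range(nx) if not mask[y][x]]
    weights = {}
    for y in range(ny):
        for x in range(nx):
            if mask[y][x]:
                weights[(x, y)] = min(
                    math.hypot(xb - x, yb - y) for xb, yb in background
                )
    return weights


# A 3 x 3 mask centred in a 5 x 5 frame.
frame = [[1 <= x <= 3 and 1 <= y <= 3 for x in range(5)] for y in range(5)]
w2 = distance_weights(frame)
```

In this example the central pixel receives twice the weight of the pixels along the mask edge, which gives the peaked weighting described in the text.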

We then tested our pipeline using four different schemes: 
(1) $w_1$ is used for the extraction of both centroids and flux;
Figure 9. Correction of the WCS pixel positions. Left: Circles indicate positions of the targets from the USNO-B1.0 catalogue falling in the frame of EPIC
202127012, where the transformation from sky to pixel coordinates was made using the WCS metric from the K2 pixel target files; target magnitudes $\widetilde{Kp}$ (see
Equation 2.2) are indicated on the color scale and by the marker size. These are plotted on top of the summed image from C0 data, with a flux scale going from
white (low flux) to dark blue (high flux) and rendered on a logarithmic scale. The arrows give the estimated correction to the WCS pixel positions; red crosses
indicate the identified maxima in the summed image that are used to estimate the correction. The primary target, i.e., EPIC 202127012, is close to the center
of the frame and emphasised with a white “+” at the corresponding target position. Right: Position differences between USNO-B1.0 targets and maxima in the
summed image. Black crosses are differences tagged as noise in a DBSCAN run on the differences; red circles give the differences of the identified cluster. The
final magnitude weighted average of the difference cluster is given by the black “+”.


here all pixels will influence the position and flux with equal
weight; (2) $w_2$ is used for the extraction of both centroids and
flux. Such a weighting reduces the sensitivity of the extracted
positions to the exact mask configuration, where, for example,
a high spatial frequency of the mask from pixels at the mask
edges could result in an unwanted flickering in the extracted
parameters for a $w_1$ weighted mask. In many ways this resembles
the weighting done naturally when using pixel response
functions (PRF) to extract centroids (Bryson et al. 2010), but
without the need to optimise for centroid and total flux using
a parametrised function. The use of Kepler calibrated PRFs
is further complicated by the fact that the pointing jitter (now
with an attitude control bandwidth of 0.02 Hz until C3, where
it will be increased to 0.05 Hz, which is half of the bandwidth
of nominal Kepler observation) and systematic movements
within a cadence are different in K2 from the nominal
Kepler mission. Also, for saturated targets the parametrisation
fails to represent the flux distribution, and the PRFs are only
defined for long-cadence (LC) observations; (3) $w_3$ is used for
the extraction of both centroids and flux; (4) $w_2$ is used for the
extraction of centroids, while $w_1$ is used for the extraction of
flux.

We compared the different weighting schemes in the power 
spectrum, in the centroids, and in the final corrected time se¬ 
ries. No noticeable difference was found between schemes 
(1), (3), or (4); with this in mind we opt for the simplest 
scheme, i.e., (1). We also note, first, that scheme (2) gives
centroid values with a lower point-to-point scatter; this might
be of use for future (from C3) 2D corrections of short-cadence
(SC; Δt ≈ 1 min) data, but seems to have little influence on
the 1D correction. Secondly, the flux from scheme (2) retains
the greatest signal from the spacecraft movement, which is
evident in the power spectra from this method. This might 
be expected from the peaked flux weighting, which increases 
the sensitivity to the spacecraft movement — a flat weighting 
is preferable for the flux extraction. Considering the choice 
between scheme (1) and (3) for extracting the flux, both of 


which have a predominantly flat weighting, (3) reduces the 
risk of contamination between targets. Scheme (1), however, 
makes it easier to identify any contamination that might occur 
in any case. For simplicity we opt for scheme (1) for both the 
position and flux extraction, but consider using either scheme 
(4), or a combination of (3) and (1), for data from future cam¬ 
paigns. 

In the final step of extracting the flux from the defined 
masks we subtract the background level given as the mode 
of the flux distribution (see Section 2.1), but now only includ¬ 
ing pixels that are unassigned to a target mask. If a target is 
close to the edge, the weighting scheme given by Equation 2.5 
will put highest centroid weight towards the edge. However, 
as long as the flux variability from position correlates with 
the measured centroid, the data from such edge targets should 
still be usable. The same goes for centroids from saturated 
targets, as long as the extracted centroids correlate with the 
relative flux variation they can be used in the correction; the 
absolute position of the target is of little importance. 


2.9. Contamination between targets 

Given that most EPIC frames contain multiple targets we 
compute a few statistics to ascertain the level of contamina¬ 
tion between these targets. As a first metric we compute a 
contamination value, C, as one minus the flux ratio of the pri¬ 
mary and all targets in the mask; 

$$C = 1 - \frac{F_{\rm primary}}{F_{\rm total}} = 1 - 10^{0.4\,(m_{\rm total} - m_{\rm primary})}\,, \tag{2.6}$$


where $m_{\rm primary}$ is the $\widetilde{Kp}$ magnitude of the brightest target in
the mask; the total apparent magnitude, $m_{\rm total}$, of the mask is
given as

$$m_{\rm total} = -2.5 \log_{10}\left(\sum_i 10^{-0.4\,m_i}\right), \tag{2.7}$$


where i runs over the number of identified stars falling within 
the given mask. Secondly, for a given frame we compute a 

Figure 10. Example of a target correlation matrix. The lower left half of 
the matrix gives the correlation (Pearson’s) between power spectra of targets 
identified in a given frame (see bottom color bar); the top right half gives the 
corresponding minimum distances between target masks (see right color bar). 

target correlation matrix. The lower left half of Figure 10 
shows the correlation between the targets’ power spectra (of 
the cleaned time series, see Section 4); the top right half gives 
the minimum distance between pixels belonging to each target 
pair. This correlation matrix can be used to easily ascertain 
the contamination between targets, and thus when extra care 
should be exercised in assigning a given signal to a given star. 
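
The contamination value of Equations 2.6 and 2.7 follows directly from the magnitudes of the stars in a mask. A minimal sketch is given below; the function names are hypothetical and the input is assumed to be a plain list of magnitudes.

```python
import math


def total_magnitude(mags):
    """Combined apparent magnitude of all stars falling in a mask."""
    return -2.5 * math.log10(sum(10 ** (-0.4 * m) for m in mags))


def contamination(mags):
    """One minus the flux ratio of the brightest star to all stars in the mask."""
    m_primary = min(mags)  # the brightest star has the smallest magnitude
    m_total = total_magnitude(mags)
    return 1.0 - 10 ** (0.4 * (m_total - m_primary))


# Two stars of equal magnitude share the flux equally, so C is close to 0.5;
# a single star in the mask gives C close to 0.
c_pair = contamination([12.0, 12.0])
c_single = contamination([12.0])
```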

3. CORRECTING THE LIGHT CURVE 

We have combined the correction part of our pipeline with 
the KASOC filter (Handberg & Lund 2014), meaning that the 
corrections based primarily on the target movement on the 
CCD are combined with corrections made for long and short 
term instrumental trends via the KASOC filter — and this in 
an iterative manner. Briefly, the KASOC filter works by com¬ 
puting two median-filtered versions of the time series with 
different filter windows, and then forms a weighted combi¬ 
nation of the two to correct the time series for instrumental 
features. We refer to Handberg & Lund (2014) for further de¬ 
tails on the KASOC filter. The integration with the KASOC 
filter also includes the iterative use of phase curve corrections, 
which is particularly useful for separating the flux variations 
from the target movement on the CCD from those of stellar 
variability with a strict periodicity (for instance the eclipses 
of a planetary or binary system). 

Below we describe the two possible correction methods in 
the pipeline. For both methods it generally holds true that 
when the amplitude of the underlying stellar signal dominates
the variations, such as in many classical pulsators, the correction
of the instrumental signal is less effective.

3.1. 1D correction

Our 1D correction draws heavily on the method presented
by Vanderburg & Johnson (2014), which these authors
called a self-flat-fielding correction, and which in turn makes
some use of methods developed for correction of Spitzer data
(Knutson et al. 2008; Ballard et al. 2010; Stevenson et al.
2012). These methods use the correlation between flux variation
and position on the CCD (from pixel sensitivity differences
across the CCD) to correct the time series for the
systematic ~6 hour variability.


Figure 11. Illustration of the change in centroid position of star 1 in EPIC
202127012 on the CCD during the second half (approximately) of C0; time
is encoded in the colour scale. A sigma clipping has been applied in the time
domain to remove points far away from the mean centroid position.

We break the time series into segments that are corrected individually.
This segmentation was implemented because even
though the movements on the CCD generally follow a well-defined
pattern (which depends on position on the focal plane),
there are slow uncorrected drifts as a function of time (see
Figure 11 for an example of this in C0). Currently, the times
where breaks are introduced are determined manually, and are
kept constant for all targets in a given campaign; we provide
flags for the times where breaks are introduced in the final
output. For C0 the time series was broken into two segments,
namely, a ~13 day segment before and a ~35 day segment after
a safe mode event occurring in C0 (lasting approximately
24 days).

For each segment, we start by identifying and flagging 
times during which a rapid positional change occurs, as the 
times when the time derivative of the change in centroid positions,
i.e., the velocity, falls outside the range of five times the
standardised MAD® around the median velocity; these data 
points are then excluded in the following corrections. 
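
The flagging of rapid positional changes can be sketched as below. Two points are assumptions rather than statements from the text: the velocity is taken as the modulus of the 2D centroid displacement per unit time, and it is evaluated with simple forward differences, so one flag is returned per interval between consecutive points. The function name `flag_rapid_motion` is hypothetical.

```python
import math
from statistics import median


def flag_rapid_motion(times, xs, ys, n_sigma=5.0):
    """Flag intervals whose centroid velocity is an outlier.

    An interval is flagged when its velocity deviates from the median
    velocity by more than n_sigma times the standardised MAD
    (1.4826 times the median absolute deviation).
    """
    velocity = [
        math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]
    med = median(velocity)
    smad = 1.4826 * median([abs(v - med) for v in velocity])
    return [abs(v - med) > n_sigma * smad for v in velocity]


# Hypothetical centroid track with one sudden jump between samples 6 and 7.
t = list(range(10))
x = [0, 1, 3, 4, 6, 7, 9, 30, 31, 33]
y = [0] * 10
flags = flag_rapid_motion(t, x, y)
```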

We then apply a principal component analysis (PCA) on
the X and Y pixel positions of each data segment. Before
applying the PCA, we select which of the X and Y pixel positions
should be retained in the estimate of the correction;
only positions with a nearest neighbour at a distance less than
four times the standardised MAD of all nearest neighbour distances
are retained in the analysis. This is needed as the PCA
otherwise is very sensitive to outliers. The PCA transformation
of the retained positions to the coordinate system given
by the first two principal components helps to ensure that
the relationship between the transformed pixel positions X'
and Y' can be described as a single-valued function, which is
needed for the following steps in the correction. It is, however,
not always clear if the first or the second principal component
should be used as the regressor. If, for instance, the relationship
between X and Y pixel positions could be described as
Y = X² (which is already a single-valued relationship) and
the range in Y values is larger than the range in X values,
then the first principal component would lie along the ordinate
and consequently a transformation making this the regressor,
that is, Y → X' and X → Y', would result in the

® which we define as 1.4826 times the MAD, with the constant being the 
scale factor that makes the MAD a consistent estimator for the standard de¬ 
viation of a normally distributed random variable. 
Figure 12. Illustration of 1D correction for star 2 (TYC 1329-1325-16) in
EPIC 202127012 (see Figure 5). Top: Uncorrected time series in flux relative
to the median. Middle: Time series corrected for long term, short term, and
positional trends. Bottom: Time series corrected for long term, short term,
positional trends, and the phase curve constructed from the middle panel (see
Figure 13).

multi-valued relationship X' = ±√Y'. We decide which of
the principal components is the best regressor by running a
LOWESS (Cleveland 1981, 1979) filter on the transformed
pixel positions, using in turn the two principal components
as regressors and computing the summed squared difference
(χ²) between the filtered and unfiltered data. The principal
component with the lowest χ² is used as the regressor.
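
For two-dimensional positions the PCA transformation reduces to a rotation about the mean position, which can be written in closed form. The sketch below illustrates only this rotation. The function name `pca_2d` is hypothetical, and the outlier rejection and the LOWESS-based choice of regressor described above are not included.

```python
import math


def pca_2d(xs, ys):
    """Project 2D positions onto their first and second principal axes.

    Returns (pc1, pc2), the mean-subtracted coordinates along the axis of
    largest variance and along the axis perpendicular to it.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cxx = sum(a * a for a in dx) / n
    cyy = sum(b * b for b in dy) / n
    cxy = sum(a * b for a, b in zip(dx, dy)) / n
    # Orientation of the major axis of the 2 x 2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * a + s * b for a, b in zip(dx, dy)]
    pc2 = [-s * a + c * b for a, b in zip(dx, dy)]
    return pc1, pc2


# Points on the diagonal: all variance ends up in the first component.
pc1, pc2 = pca_2d([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0])
```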

In the transformed coordinates we compute a smoothed version
of the Y' vs. X' positions by again applying a LOWESS
filter. We then calculate the curve length s along this filtered
relationship as

$$s = \int \sqrt{1 + \left(\frac{dY'}{dX'}\right)^2}\; dX'\,,$$

using finite differences as the derivative of the curve and cumulatively
integrating for the curve length using the composite
trapezoidal rule. The curve length serves as the new 1D representation
of the 2D stellar position on the CCD.
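
The curve length computation can be sketched as follows, assuming the standard arc-length integrand sqrt(1 + (dY'/dX')²) and X' values sorted in strictly increasing order. The function name `curve_length` and the particular finite-difference stencil (central differences inside, one-sided at the ends) are assumptions.

```python
import math


def curve_length(x, y):
    """Cumulative arc length along y(x) at every sample.

    The derivative is estimated with finite differences and the arc-length
    integrand is integrated with the composite trapezoidal rule.
    """
    n = len(x)
    deriv = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        deriv.append((y[hi] - y[lo]) / (x[hi] - x[lo]))
    integrand = [math.sqrt(1.0 + d * d) for d in deriv]
    s = [0.0]
    for i in range(1, n):
        step = 0.5 * (integrand[i] + integrand[i - 1]) * (x[i] - x[i - 1])
        s.append(s[-1] + step)
    return s


# For the straight line y = x the length over [0, 3] is 3 * sqrt(2).
s = curve_length([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0])
```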

The correction to the light curve is then found from a 
LOWESS filtering of the relative flux as a function of curve 
length, thereby capturing the average positional dependency 
of the flux level. In the correction step we make sure to re¬ 
move any long term trends in the light curve to obtain the 
relative flux, as such changes will correlate poorly with the 
movement on the CCD. Some of the long term variability
could in principle be caused by the slow drift of the target
on the CCD (Figure 11), but could just as well be a separate
instrumental effect, for instance from focus changes caused
by heating of the mirror. The background flux level could also
enter in the long term variability if this is not corrected for
properly during the light curve extraction. We make the correction
iteratively, with a better separation between long term
and position-dependent variations as the outcome.

In Figure 12 we give an example of the 1D correction for
the C0 observations of star 2 (TYC 1329-1325-16) in EPIC
202127012 (see Figure 5); here we further include in the
KASOC filter a correction for the dominating periodic signal
by iteratively correcting by the phase curve of this signal
(see Figure 13). The input period for this correction was determined
from the autocorrelation function of the time series.


Figure 13. Phase curve for star 2 (TYC 1329-1325-16) in EPIC 202127012 
(see Figure 5). Top: Phase curve of uncorrected time series with the flux 
relative to the median (top panel in Figure 12). Bottom: Phase curve (black) 
of time series corrected for long term, short term, and positional trends (middle
panel in Figure 12). From the black points we form the red phase curve
via a moving median smooth; this smoothed phase curve is used in the iterative
correction performed by the KASOC filter to obtain the bottom panel of
Figure 12 (Handberg & Lund 2014). 

3.2. 2D correction 

In our second approach we make a 2D histogram of the 
measured X and Y centroids of the star. In each bin we com¬ 
pute the median of the relative flux of points falling in that bin; 
this will capture the positional variation in the relative flux in 
a robust manner. In the reconstruction of the flux variability 
in the time domain we use a rectangular bivariate linear spline 
to interpolate between the bin centres. The reason for going to
2D is that flux variations also occur in the direction perpendicular
to the overall roll motion (see Figure 14 for an example).
Such variations are unresolved in the 1D treatment, because
the scatter in the relative flux versus curve length is reduced
to a line; one would therefore suspect that the 1D treatment
will leave residuals in the corrected light curve that could be
accounted for in a 2D treatment.
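
The binning step of the 2D approach can be sketched as a median of the relative flux in each cell of a 2D histogram of the centroids. The function names and the explicit bin-edge arguments are assumptions, and the bivariate spline interpolation between bin centres is not included.

```python
from bisect import bisect_right
from statistics import median


def bin_index(value, edges):
    """Index of the bin [edges[i], edges[i+1]) holding value, or None if outside."""
    i = bisect_right(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None


def binned_median_flux(xs, ys, flux, x_edges, y_edges):
    """Median relative flux in every occupied (x, y) centroid bin."""
    bins = {}
    for x, y, f in zip(xs, ys, flux):
        ix, iy = bin_index(x, x_edges), bin_index(y, y_edges)
        if ix is not None and iy is not None:
            bins.setdefault((ix, iy), []).append(f)
    return {key: median(values) for key, values in bins.items()}


# Hypothetical centroids and relative fluxes; the outlier of 100 in the
# first bin does not shift that bin's median.
medians = binned_median_flux(
    xs=[0.1, 0.2, 0.3, 1.5],
    ys=[0.1, 0.1, 0.1, 0.5],
    flux=[1.0, 3.0, 100.0, 2.0],
    x_edges=[0.0, 1.0, 2.0],
    y_edges=[0.0, 1.0],
)
```

Using the median rather than the mean in each bin is what makes the estimate robust against single discrepant points.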

The most difficult aspect of the 2D binning is the choice 
of bin size. If the bins are too small the reconstruction of 
the flux variation will be noisy; one is effectively overfitting. 
On the other hand, if the bins are too large the reconstructed 
variation will be a smoothed version of the underlying varia¬ 
tion, and significant residuals may be left in the light curve. 
The sensitivity to the bin size is largest for long-cadence (LC;
Δt ≈ 29.4 minutes) observations due to the smaller number of
data points, and consequently larger variance on the median.
The method is thus best suited for SC observations where the
exact bin size is less influential on the reconstructed instrumental
variability.

Depending on the shape of the stellar movement in the X-Y
plane it can be advantageous to transform the movements to
a (predominantly) horizontal variation before making the 2D
histogram; this could, for instance, be achieved by dividing
the centroid Y-components by a smoothed fit to the movement
in order to reduce the span, and thus the size, of the
histogram.

So far, in our testing of this method on SC data we have 
not found it to be preferable to the 1D method. This is likely
a result of the current value of the attitude control bandwidth 
of 0.02 Hz (50 s), which is very close to the SC integration 
time. Because of the allowed amount of movement within a 
SC integration this will lead to a larger smear and variance in 
Figure 14. Centroid positions for star 1 in EPIC 202127012, with the relative
flux given by the colour bar. The surface shows the interpolated relative flux 
from the medians of the 2D histogram. Here we used 20 bins in both the 
X and Y-directions. The dashed line gives a smooth version of the overall 
positional variation, that could be used to correct the positions to a more one 
dimensional variability. 

the bin medians. We expect this to improve from C3 onwards 
when the bandwidth will be increased to 0.05 Hz. 

4. PIPELINE TEST 

As a test of the pipeline we analysed the pixel frames of
the 452 LC targets in the C0 proposal GO0118¹⁰ (“Galactic
Archaeology on a grand scale”; PI: Stello, D.). We also analysed
the known transiting system WASP-85 (see Brown et al.
2014), which was observed in SC during C1.

Because our pipeline enables the extraction of data from 
several targets in a given frame, we ended with a total of 4691 
targets from the GO0118 proposal, and thus light curves to 
analyse — this corresponds to a gain in the amount of data 
by a factor of ~10.4, and this even when adopting a limit on 
the minimum number of pixels in a mask of 8 before a target 
would be considered. 

4.1. The power spectrum 

After data were extracted using the K2P² pipeline, they
were corrected with the KASOC pipeline (Handberg & Lund
2014) using the 1D correction method described in Section 3.1,
and a frequency power density spectrum was calculated.
The 1D correction removes most of the signal from
the spacecraft roll, but residual spikes still often appear at harmonics
of ~47.2281 μHz. These spikes are damaging to any
automated search for power; to remedy this we tested the effect
of “cleaning” the residual spikes using a prewhitening
routine (see, e.g., Ponman 1981; Belmonte et al. 1991), which
removes all significant power in a ±1 μHz window around the
residual spikes. For every window, oversampled by a factor
of ten, we iteratively remove the frequency with the highest
power-to-background ratio (PBR; the background is calculated
as the median of the power within the window multiplied
by ~1.42, which is the conversion factor between the median
and the mean for a $\chi^2_2$ distribution) if this ratio has a false-alarm
detection probability less than 10% (Scargle 1982; Appourchaux
2004; Lund et al. 2012). Besides the signal from

¹⁰ http://keplerscience.arc.nasa.gov/K2/docs/Campaigns/C0/GO0118_Stello.pdf


the spacecraft roll, we also see a signal at ~5.92 μHz (equivalent
to ~1.96 days); we suspect this signal originates from
the periodic momentum dumps of the reaction wheels through
thruster firings, which happen every two days (Howell et al.
2014), and enters the power spectrum via the spectral window
(see right panel in Figure 15). The left panel of Figure 15 
shows the efficiency of the procedure for removing the resid¬ 
ual instrumental peaks from the power spectrum. Instrumen¬ 
tal signals can still be seen in the cleaned power spectrum, 
but now with amplitudes low enough to allow the detection of 
asteroseismic signals. 

4.2. High frequency photometric variability 

To detect stellar oscillations in the frequency power spectrum
it is important that the white (shot) noise level does not
dominate the signal; this is especially true for the detection
of low amplitude stochastic solar-like oscillations. It is
thus of interest to know the characteristic levels of the short
time scale (high frequency) noise in K2 LC data as a function
of Kepler magnitude, or in our analysis $\widetilde{Kp}$. We note,
however, that a measure of the high-frequency noise is not
necessarily tantamount to a measure of the constant power
spectral density white noise level. For each of the targets in
the sample we computed a proxy for the instrumental variability
using the median of the absolute point-to-point flux difference
of the KASOC corrected and cleaned time series; this
proxy was coined the median differential variability (MDV)
by Basri et al. (2013). As detailed in Basri et al. (2013) the
MDV will on short time scales (with point-to-point being the
shortest) be most sensitive to high frequency noise; variability
on time scales longer than the LC sampling of ~29.4 minutes
will on the other hand contribute very little to the MDV.
To enable a comparison of the MDV for K2 with that of the
nominal Kepler data, we compute the point-to-point MDV for
the set of 6210 LC targets from the Kepler APOKASC (Pinsonneault
et al. 2014) data release 1 sample. In the KASOC
filtering we used the following filter settings: $\tau_{\rm long}$ = 3 days
and $\tau_{\rm short}$ = 0.25 days (see Handberg & Lund (2014) for details
on these settings); for the APOKASC targets we used
$\tau_{\rm long}$ = 30 days, which is too long a time scale for the duration
of the K2 light curves. Figure 16 shows the resulting
MDV measures as a function of magnitude for both the K2
and nominal Kepler targets. Our results from the nominal Kepler
data are in overall agreement with the results presented
in Basri et al. (2013). We find that at $\widetilde{Kp}$ < 10 the ratio
between the median MDV in K2 and nominal Kepler falls below
~2, and increases to ~10 at $\widetilde{Kp}$ ~ 14. For the K2 values
we further see an indication of a slight gradient in the MDV
with angular distance to the bore sight for a given magnitude,
which might be expected from the larger systematic imprint
on the light curve further away from the bore sight. Comparing
our values to those from Aigrain et al. (2014) (their
Table 1, 3-pixel radius masks) we find, as evident from Figure 16,
an excellent agreement. We also computed point-to-point
MDVs for our target sample as corrected in Vanderburg
(2014)¹¹, and find that the median binned values generally
agree within a factor of two. For these comparisons it should
be noted that we are unaware if the authors of the comparison
studies checked the sources of the Kepler magnitudes from
the TPD, entering the magnitude calibration, and how they
possibly transformed these.
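
The point-to-point MDV used above is the median of the absolute differences between consecutive flux values. A minimal sketch is given below; the function name `point_to_point_mdv` is hypothetical, and the input is assumed to be an already corrected, evenly sampled relative flux series.

```python
from statistics import median


def point_to_point_mdv(flux):
    """Median of the absolute differences between consecutive flux values."""
    return median([abs(b - a) for a, b in zip(flux, flux[1:])])


# Hypothetical relative fluxes; the single large step of 2.0 barely
# affects the median of the point-to-point differences.
mdv = point_to_point_mdv([1.0, 1.5, 1.0, 3.0, 3.5])
```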

¹¹ https://www.cfa.harvard.edu/~avanderb/k2.html
Figure 15. Left: Effect of cleaning residual peaks from the spacecraft roll. The black curve gives the mean of the 4691 power spectra, each of which has been
divided by a 5 μHz window running median smooth to convert to a power-to-background ratio (PBR). Here the residuals from the spacecraft roll are clearly visible
at integer multiples of ~47.2281 μHz. The red curve gives the spectrum after prewhitening the residual peaks. Right: Average spectral window for the 4691 time series,
normalised to 1 at zero frequency. In the averaging of the power spectrum and spectral window both of these were interpolated onto a common frequency scale
using a smoothing spline interpolation.



Figure 16. Proxy for the short time scale (high frequency) noise, given by 
the point-to-point median differential variability (MDV), as a function of the 
proxy Kepler magnitude $\widetilde{Kp}$. Circular colored markers (blue to green) give
the estimates for the K2 sample, with the color scale indicating the angular 
distance to the space craft bore sight (see color bar); circular black mark¬ 
ers give the estimates for APOKASC LC targets from the nominal Kepler 
mission (for these targets their actual Kepler magnitudes were used); red cir¬ 
cular markers give the median MDVs for both K2 and nominal Kepler values 
in 0.5 magnitude bins; square black markers give the median MDVs for K2 
from Aigrain et al. (2014). 

We note that a comparison of MDVs cannot be seen directly
as a comparison of the quality of the light curves and
the corrections applied, and should be evaluated in the context
for which the corrected data are intended. A measure like the
MDV will depend strongly on the choice of free parameters in
the correction. In Vanderburg (2014) the C0 light curves were
processed with the intent of detecting planets. Here the light
curves are corrected individually in three segments; the values
from the mask with the lowest 6-hour scatter were adopted,
trying 20 masks of different sizes; the fit to the flux versus
curve length was made with a finer binning than in Vanderburg
& Johnson (2014). All of these tweaks will conspire to
give a lower point-to-point scatter suited for planet detection.


4.3. Target examples 


In the following we show a few examples of the many tar¬ 
gets among the 4691 that display astrophysical signals. We 
note that we have not performed a systematic assessment of 
the targets. 

Figure 17 gives an example of three red giant targets, showing
low-frequency solar-like oscillations. The levels of power
here suggest that for C0 it should in general be possible to detect
oscillations in red giants, and obtain average asteroseismic
measures such as Δν and ν_max. We note that for the three
cases shown in Figure 17 the $\widetilde{Kp}$ magnitudes were all <11,
and the high frequency noise in the time domain (approximated
by the MDV) is, according to Figure 16, only about
2-3 times higher in K2 compared to the nominal Kepler mission.
If we assume the MDV scales linearly with the shot
noise, and given that power scales with the square of the
amplitude, this translates to a factor of 4-9 times higher noise in
the power density spectrum compared to the nominal Kepler
mission. For a systematic analysis of the C0 red giants we
refer to Stello et al. (in prep.).

Figure 18 gives an example of three classical pulsators
showing, predominantly, δ Scuti-like oscillations. For this
type of star the noise introduced in K2 is clearly of little importance
due to the large amplitudes of the oscillations.

In Figure 19 we present the SC data and corrected phase 
curve for WASP-85 (Brown et al. 2014), having the EPIC 
number 201862715. The raw data for this system shows a 
clear modulation from surface spots, together with the smaller 
amplitude instrumental modulation. In the reduction of this 
light curve we used the information of the orbital period of the 
system in the iterative correction performed by the KASOC 
filter. The bottom panel of Figure 19 gives the phase curve at 
the final iterative step. 

In Figure 20 we present the light curves for a few targets 
showing distinct eclipse-like features. We note that in none of 
these cases did the target correspond to the target associated 
with the respective EPIC numbers, and they would thus have 
been missed had only the primary target been extracted. 

5. CONCLUSION 

We have presented our version of a K2 data analysis
pipeline, with the objective that it should be fully automatic
and work robustly. From the analysis of LC targets from the
C0 proposal GO0118 we found that the pipeline indeed works
Figure 17. Example power spectra of red giant targets from GO0118 observed during C0, showing low-frequency solar-like oscillations. All of the targets shown
here are the primary ones for the respective EPIC numbers.

Figure 18. Example power spectra of targets from GO0118 observed during C0, showing classical δ Scuti-like oscillations, and possibly some γ Dor oscillations.
Only in the case of EPIC 202086286 (middle panel) does the target correspond to the primary target.

Figure 19. SC data for WASP-85 obtained during C1. Top: Raw light curve
for WASP-85 showing spot modulation and distinctive transits. Bottom:
Phase curve of the corrected light curve for WASP-85; the red curve gives
the smoothed phase curve that would have been used in the final correction
of the light curve.

very robustly, and was able to separate close targets and extract
data for multiple targets in a given pixel frame. This
resulted in an increase in the number of available light curves
by a factor of ~10.4 for C0, and will naturally vary with the
amount of crowding in the different campaigns. Given the
large increase in number of potential targets for each assigned
EPIC, it needs to be settled how these new targets might be
named and identified in other studies.

Concerning the construction of pixel masks we note that 


many of the published studies of K2 data apply circular
masks. However, the flux distribution for a target in K2 is generally
far from circular and symmetric, especially if a summed
image is used. If a circular mask is used it needs to be large
enough to encompass the movement of the target on the CCD;
this in turn considerably increases the risk of contamination
from other nearby targets. The use of clustering of pixels
from the summed image for defining the masks better approximates
the actual flux distribution of the target. For later versions
of the pipeline we will investigate in greater detail if
any weighting of the pixel masks can lead to a reduction of
the high-frequency noise, e.g., as measured via the point-to-point
MDV. In relation to this we will also test further the potential
impact of a high spatial frequency of the derived pixel
masks. More effort needs to be invested in improving the correction
of instrumental trends via the 2D method. When data
from C3 become available, where the fine pointing of the
spacecraft should be improved, we will revisit this method in
more detail. This could also include an implementation of the
procedure outlined in Kjeldsen et al. (2013a,b). We will also
continue to try and improve the 1D correction, which in our tests
still seems to leave artefacts at harmonics of ~47.2281 μHz.
A better removal of these artefacts is clearly needed if an automatic
search of asteroseismic power is desired, and simply
masking the peaks in the power spectrum will only have a limited
impact if the effect of the spectral window is neglected.
Our attempt at cleaning the instrumental peaks did improve
the power spectrum, but still could not fully remove the instrumental
peaks and the window function persisted, which
might be expected from cleaning a highly non-sinusoidal signal.
As part of the correction we will look into measures other
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Figure 20. Phase curves for a subset of targets exhibiting transit-like features, rendered over two full phases with the first primary transit feature centred at 0.5; 
the approximate period for each target is given in the label of the abscissa. For the purpose of this illustration periods were estimated by eye from the light curves. 
In each panel we have indicated what we find the most likely USNO-Bl.O number for the target. The EPIC number given in each panel if for the star in whose 
pixel frame the target was found. In neither of the cases show did the target con'espond to the main target associated with the EPIC number. 


than the centroids for the position of the stars on the CCD; this 
could include the construction of a mean relative movement 
on the CCD from combining the measures of all targets in 
a given pixel frame. Also of interest is whether the house¬ 
keeping data from the Kepler spacecraft can be incorporated 
for a better overall positional correction. We will attempt to 
improve the treatment of saturated targets, which are difficult 
to deal with via the DBSCAN clustering routine. Aspects that 
should be improved here are, for instance, a better separation 
of targets that fall within or close to high-flux pixels from a 
saturated target. 
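As an illustration of the mean relative movement mentioned above, the following minimal sketch combines the centroid series of all targets in one pixel frame. It is not the K2P2 implementation: the function name, the input layout (a dictionary of per-target centroid lists) and the choice of each target's time-averaged position as its reference point are assumptions made for this example.

```python
from statistics import fmean


def mean_relative_movement(centroids):
    """Combine per-target centroid series into one relative-movement series.

    centroids: dict mapping a (hypothetical) target label to a list of
    (x, y) centroid positions in pixels, one entry per time step. All
    lists are assumed to have the same length and no missing values.
    Returns a list of (dx, dy) offsets in pixels, one per time step.
    """
    n_steps = len(next(iter(centroids.values())))

    # Express each target's position relative to its own time-averaged
    # position, so targets at different CCD locations become comparable.
    relative = []
    for series in centroids.values():
        x0 = fmean(x for x, _ in series)
        y0 = fmean(y for _, y in series)
        relative.append([(x - x0, y - y0) for x, y in series])

    # Average the relative offsets over all targets at each time step.
    movement = []
    for i in range(n_steps):
        dx = fmean(r[i][0] for r in relative)
        dy = fmean(r[i][1] for r in relative)
        movement.append((dx, dy))
    return movement


# Two synthetic targets sharing the same drift of 0.2 pixel per step in x.
example = {
    "target_a": [(10.0, 5.0), (10.2, 5.1), (10.4, 5.2)],
    "target_b": [(3.0, 8.0), (3.2, 8.1), (3.4, 8.2)],
}
movement = mean_relative_movement(example)
```

A median could be substituted for the mean at either step if single bad centroids (for instance from a saturated target) are a concern; which choice works best on K2 data is not tested here.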

We note that our method could potentially be used for dense 
fields including stellar clusters, and could also be applied to 
super-stamps from K2 and the nominal Kepler mission, as 
well as the upcoming TESS (Ricker et al. 2014) and PLATO 
2.0 (Rauer et al. 2013) missions. During the development 
of K2P2 we tested the application of the pixel clustering on 
every time step for the pixel frame of a given target rather 
than using the summed image. A complication of this method 
over using the summed image is that the number of targets 
identified in the pixel frame varies slightly with time due to 
noise, and the cluster number of a given target will also vary 
in time. From tests of this version of the pipeline on K2 engineering 
data, we found that using the pixel clustering on 
every time step could enable the detection of asteroids and/or 
comets (or other unidentified objects) as they passed through 
the pixel frame (see Szabó et al. 2015, for an analysis of asteroids 
found during the K2 engineering run). When the centroid estimates 
for all identified targets (at a given time step) are shown in a 
scatter plot against time, moving targets such as asteroids make 
clear centroid trails that deviate from the horizontal trails of 
quasi-stationary targets such as stars. Identification and analysis 
of such centroid trails could lead to the detection and tracking 
of hitherto unknown asteroids/comets. 

TESS: Transiting Exoplanet Survey Satellite; PLATO: PLAnetary Transits 
and Oscillations of stars. 
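The separation of centroid trails described above could be automated along the following lines. This is only a sketch under stated assumptions, not part of the pipeline: it assumes the per-time-step clusters have already been linked into one track per object (which, as noted, is not trivial because cluster numbers vary in time), that each track contains at least two distinct time stamps, and that a hypothetical threshold `min_drift` (in pixels) separates moving objects from quasi-stationary stars.

```python
def slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def flag_moving_targets(tracks, min_drift=0.5):
    """Return labels of tracks whose fitted centroid drift exceeds min_drift.

    tracks: dict mapping a (hypothetical) track label to a list of
    (time, x, y) tuples, with x and y the centroid position in pixels.
    min_drift: assumed threshold, in pixels, on the total linear drift
    over the time span of the track.
    """
    moving = []
    for label, track in tracks.items():
        times = [t for t, _, _ in track]
        span = max(times) - min(times)
        # Total drift in each direction implied by the fitted linear trend.
        drift_x = slope(times, [x for _, x, _ in track]) * span
        drift_y = slope(times, [y for _, _, y in track]) * span
        if (drift_x ** 2 + drift_y ** 2) ** 0.5 > min_drift:
            moving.append(label)
    return moving


# Synthetic example: a star that jitters in place and an object that
# crosses the frame at one pixel per time step in x.
tracks = {
    "star": [(0, 5.0, 5.0), (1, 5.01, 4.99), (2, 5.0, 5.0), (3, 4.99, 5.01)],
    "asteroid": [(0, 1.0, 1.0), (1, 2.0, 1.5), (2, 3.0, 2.0), (3, 4.0, 2.5)],
}
flagged = flag_moving_targets(tracks)
```

A linear fit is used here only because it is the simplest description of a trail; for K2 data the threshold would need to exceed the apparent stellar movement caused by the pointing drift, and no value is recommended here.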